Trillion-LLaVA-7B is a vision-language model with image understanding capabilities, trained on English visual-language instruction pairs, demonstrating exceptional cross-lingual visual reasoning abilities.
Text-to-Image
Transformers Supports Multiple Languages